Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise. — John Tukey
\[ p(\theta | D) = \frac{p(D | \theta) \times p(\theta)}{p(D)}\quad \textrm{posterior} = \frac{\textrm{likelihood} \times \textrm{prior}}{\textrm{marginal likelihood}}\]
Likelihood (Sunnaker et al. 2013):
“probability of the observed data under a particular statistical model”
“quantifies the support data lend to particular values of parameters”
What’s the probability of heads (\(p\)) for a coin that has been observed to come up heads three times (\(X = 3\)) in nine tosses (\(n = 9\))?
\[X \sim \mathrm{Binomial}(n, p)\qquad \mathrm{Pr}(X = x; n, p) = \binom{n}{x}p^x(1-p)^{n-x}\]
\[\begin{aligned} \ell(p) &= \mathrm{Pr}(X = 3; n = 9, p = ?)\\ &= \frac{9!}{3!(9-3)!}p^3(1-p)^{9-3}\\ &= 84p^3(1-p)^6 \end{aligned}\]

We know the likelihood function from our knowledge of the underlying data-generating process, e.g., binomial.
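The likelihood above can be evaluated directly; a minimal sketch (the grid search over \(p\) is just for illustration):

```python
from math import comb

def likelihood(p, x=3, n=9):
    """Binomial likelihood Pr(X = x; n, p), viewed as a function of p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The coefficient matches the derivation: C(9, 3) = 84
print(comb(9, 3))       # 84

# The likelihood peaks at the maximum-likelihood estimate p-hat = x/n = 1/3
ps = [i / 100 for i in range(1, 100)]
p_hat = max(ps, key=likelihood)
print(round(p_hat, 2))  # 0.33
```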
What if we don’t have the likelihood function?
What if we’re not sure how the observed data supports particular values of the parameter we’re trying to discover?
What if our likelihood function is hard to write down explicitly or computationally expensive to evaluate?
simulations of the temperature map of the CMB
large-scale structure of galaxy distributions
mass and luminosity distributions for stars and galaxies
Q: Can we sample the posterior without evaluating the likelihood?
YES! Maybe
ABC replaces the calculation of the likelihood function with simulation.
Is the simulated data close to the observed data?
\[\rho(\hat{D},D ) \le \varepsilon\]
Common distance measures include the Euclidean and Mahalanobis distances, typically computed between summary statistics of the simulated and observed data.
This is a tricky choice that heavily affects computation time.
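For the coin example above, ABC rejection sampling needs only forward simulation, never a likelihood evaluation. A minimal sketch, assuming a Uniform(0, 1) prior on \(p\), \(\rho(\hat{D}, D) = |\hat{x} - x|\), and \(\varepsilon = 0\) (all illustrative choices):

```python
import random

def abc_rejection(x_obs=3, n=9, eps=0, n_draws=20_000, seed=1):
    """ABC rejection sampling for the coin-toss example.

    Draw p from a Uniform(0, 1) prior, simulate n tosses, and keep p
    only if the simulated head count is within eps of the observed one.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()                                  # draw from the prior
        x_sim = sum(rng.random() < p for _ in range(n))   # simulate the data
        if abs(x_sim - x_obs) <= eps:                     # rho(D_hat, D) <= eps
            accepted.append(p)
    return accepted

samples = abc_rejection()
print(len(samples), sum(samples) / len(samples))
```

With \(\varepsilon = 0\) and discrete data, the accepted draws are exact samples from the posterior, here Beta(4, 7) with mean \(4/11 \approx 0.36\); a larger \(\varepsilon\) trades bias for a higher acceptance rate.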
The ratio of posterior model probabilities indicates which model is better supported by the data.
\[\frac{p(M_1|D)}{p(M_2|D)} = \frac{p(D|M_1) p(M_1)}{p(D|M_2) p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}\]
\(B_{1,2}\) is known as the Bayes Factor.
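One common ABC approach to model comparison samples a model indicator alongside the parameters; with equal prior model probabilities, the ratio of acceptance counts estimates \(B_{1,2}\). A sketch for the coin data, comparing a fair-coin model against a Uniform(0, 1) prior on \(p\) (both models are illustrative choices):

```python
import random

def abc_bayes_factor(x_obs=3, n=9, n_draws=100_000, seed=2):
    """Estimate a Bayes factor by ABC with eps = 0 and equal model priors.

    M1: fair coin (p = 0.5).  M2: p ~ Uniform(0, 1).
    The ratio of acceptance counts estimates B_{1,2} = p(D|M1) / p(D|M2).
    """
    rng = random.Random(seed)
    accepts = {1: 0, 2: 0}
    for _ in range(n_draws):
        model = rng.choice((1, 2))                      # equal model priors
        p = 0.5 if model == 1 else rng.random()
        x_sim = sum(rng.random() < p for _ in range(n))
        if x_sim == x_obs:                              # exact match (eps = 0)
            accepts[model] += 1
    return accepts[1] / accepts[2]

print(abc_bayes_factor())
```

Here the exact answer is \(B_{1,2} = \binom{9}{3}(1/2)^9 \big/ (1/10) \approx 1.64\), so the estimate should land near that value.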
Bias due to non-zero value for \(\varepsilon\)
Many researcher degrees of freedom: choice of summary statistics, distance measure, and tolerance \(\varepsilon\)
Curse of Dimensionality
A method to circumvent intractable or ill-behaved likelihood functions.
Computationally more expensive than standard Bayesian samplers.
Choice of the distance function \(\rho(\cdot)\) and tolerance threshold \(\varepsilon\) needs careful attention
Not yet feasible for high-dimensional problems (work in progress!)